Displayed Categories
We introduce and develop the notion of *displayed categories*.
A displayed category over a category C is equivalent to "a category D and
functor F : D --> C", but instead of having a single collection of "objects of
D" with a map to the objects of C, the objects are given as a family indexed by
objects of C, and similarly for the morphisms. This encapsulates a common way
of building categories in practice, by starting with an existing category and
adding extra data/properties to the objects and morphisms.
The interest of this seemingly trivial reformulation is that various
properties of functors are more naturally defined as properties of the
corresponding displayed categories. Grothendieck fibrations, for example, when
defined as certain functors, use equality on objects in their definition. When
defined instead as certain displayed categories, no reference to equality on
objects is required. Moreover, almost all examples of fibrations in nature are,
in fact, categories whose standard construction can be seen as going via
displayed categories.
We therefore propose displayed categories as a basis for the development of
fibrations in the type-theoretic setting, and similarly for various other
notions whose classical definitions involve equality on objects.
Besides giving a conceptual clarification of such issues, displayed
categories also provide a powerful tool in computer formalisation, unifying and
abstracting common constructions and proof techniques of category theory, and
enabling modular reasoning about categories of multi-component structures. As
such, most of the material of this article has been formalised in Coq over the
UniMath library, with the aim of providing a practical library for use in
further developments.
Comment: v3: Revised and slightly expanded for publication in LMCS. Theorem numbering change.
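In outline, the data of a displayed category can be spelled out as follows (a reconstruction of the definition sketched above, with notation chosen here for exposition):

```latex
A displayed category $D$ over a category $C$ consists of:
\begin{itemize}
  \item for each object $x$ of $C$, a type $D_x$ of ``objects over $x$'';
  \item for each morphism $f : x \to y$ in $C$ and objects
        $\bar{x} \in D_x$, $\bar{y} \in D_y$, a set of ``morphisms over $f$'',
        written $\bar{f} : \bar{x} \to_f \bar{y}$;
  \item identity and composition operations lying over those of $C$:
        $\mathrm{id}_{\bar{x}} : \bar{x} \to_{\mathrm{id}_x} \bar{x}$, and for
        $\bar{f} : \bar{x} \to_f \bar{y}$, $\bar{g} : \bar{y} \to_g \bar{z}$
        a composite $\bar{f} ; \bar{g} : \bar{x} \to_{f ; g} \bar{z}$;
\end{itemize}
subject to unit and associativity axioms lying over those of $C$. The total
category $\textstyle\int D$ has pairs $(x, \bar{x})$ as objects and pairs
$(f, \bar{f})$ as morphisms, with evident projection functor
$\pi : \textstyle\int D \to C$; this recovers the ordinary ``category $D$ with
functor $F : D \to C$'' picture.
```

Note that nothing in this definition compares objects of $C$ for equality: morphisms over $f$ are indexed by $f$ itself, which is what makes the fibration conditions expressible without equality on objects.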
Analysis of spatial and temporal dynamics of xylem refilling in Acer rubrum L. using magnetic resonance imaging.
We report results of an analysis of embolism formation and subsequent refilling observed in stems of Acer rubrum L. using magnetic resonance imaging (MRI). MRI is one of the very few techniques that can provide direct, non-destructive observations of the water content within opaque biological materials at micrometer resolution. Thus, it has been used to determine temporal dynamics and water distributions within xylem tissue. In this study, we found good agreement between MRI measures of pixel brightness, used to assess xylem liquid water content, and the percent loss of hydraulic conductivity (PLC) in response to water stress (P50 values of 2.51 and 2.70 for MRI and PLC, respectively). These data provide strong support that pixel brightness is well correlated with PLC and can be used as a proxy for PLC even when single vessels cannot be resolved in the image. Pressure-induced embolism in moderately stressed plants resulted in an initial drop in pixel brightness. This drop was followed by a gain in brightness over the 100 min following pressure application, suggesting that plants can restore water content in the stem after induced embolism. This recovery was limited to the current-year wood ring; older wood did not show signs of recovery within the length of the experiment (16 h). In vivo MRI observations of the xylem of moderately stressed (~-0.5 MPa) A. rubrum stems revealed evidence of spontaneous embolism formation followed by rapid refilling (~30 min). Spontaneous (not induced) embolism formation was observed only once, despite over 60 h of continuous MRI observations made on several plants. Thus, this observation provides evidence for the presence of a naturally occurring embolism-refilling cycle in A. rubrum, but it is impossible to draw conclusions about its frequency in nature.
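For reference, the PLC quantity used above is conventionally defined as follows (standard in the plant-hydraulics literature; the formula itself is not given in the abstract):

```latex
\mathrm{PLC} = 100\,\left(1 - \frac{k_h}{k_{\max}}\right),
```

where $k_h$ is the measured hydraulic conductivity of the stressed stem and $k_{\max}$ is the conductivity after embolism removal; $P_{50}$ denotes the water potential at which $\mathrm{PLC} = 50\%$.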
Sparse Tensor Transpositions
We present a new algorithm for transposing sparse tensors called Quesadilla.
The algorithm converts the sparse tensor data structure to a list of
coordinates and sorts it with a fast multi-pass radix algorithm that exploits
knowledge of the requested transposition and the tensor's input partial
coordinate ordering to provably minimize the number of parallel partial sorting
passes. We evaluate both a serial and a parallel implementation of Quesadilla
on a set of 19 tensors from the FROSTT collection, a set of tensors taken from
scientific and data analytic applications. We compare Quesadilla and a
generalization, Top-2-sadilla, to several state-of-the-art approaches, including
the tensor transposition routine used in the SPLATT tensor factorization
library. In serial tests, Quesadilla was the best strategy for 60% of all
tensor and transposition combinations and improved over SPLATT by at least 19%
in half of the combinations. In parallel tests, at least one of Quesadilla or
Top-2-sadilla was the best strategy for 52% of all tensor and transposition
combinations.
Comment: This work will be the subject of a brief announcement at the 32nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '20).
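The coordinate-sort step can be sketched as a multi-pass least-significant-digit radix sort over the permuted dimensions. The sketch below runs one stable counting-sort pass per dimension; it does not implement Quesadilla's key contribution of minimizing the number of passes by exploiting the input's existing partial coordinate ordering:

```python
def transpose_coo(coords, vals, perm):
    """Transpose a sparse tensor in coordinate (COO) form.

    coords: list of index tuples; vals: parallel list of values;
    perm: the requested dimension permutation.
    Returns the coordinates with dimensions reordered by `perm`, sorted
    lexicographically via LSD radix sort: one stable counting-sort pass
    per dimension, least significant dimension first.
    """
    if not coords:
        return [], []
    # Apply the permutation to every coordinate.
    new_coords = [tuple(c[d] for d in perm) for c in coords]
    order = list(range(len(vals)))
    ndim = len(perm)
    for dim in reversed(range(ndim)):  # LSD: last dimension first
        size = max(c[dim] for c in new_coords) + 1
        # Stable counting sort of `order` on this dimension's index.
        buckets = [0] * (size + 1)
        for i in order:
            buckets[new_coords[i][dim] + 1] += 1
        for b in range(size):
            buckets[b + 1] += buckets[b]   # prefix sums -> start offsets
        out = [0] * len(order)
        for i in order:                    # stable placement
            k = new_coords[i][dim]
            out[buckets[k]] = i
            buckets[k] += 1
        order = out
    return [new_coords[i] for i in order], [vals[i] for i in order]
```

Each pass is stable, so after processing all dimensions the coordinates are in full lexicographic order for the transposed layout; skipping passes whose ordering is already guaranteed by the input is precisely where the provable pass-minimization comes in.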
On Optimal Partitioning For Sparse Matrices In Variable Block Row Format
The Variable Block Row (VBR) format is an influential blocked sparse matrix
format designed to represent shared sparsity structure between adjacent rows
and columns. VBR consists of groups of adjacent rows and columns, storing the
resulting blocks that contain nonzeros in a dense format. This reduces the
memory footprint and enables optimizations such as register blocking and
instruction-level parallelism. Existing approaches use heuristics to determine
which rows and columns should be grouped together. We adapt and optimize a
dynamic programming algorithm for sequential hypergraph partitioning to produce
a linear time algorithm which can determine the optimal partition of rows under
an expressive cost model, assuming the column partition remains fixed.
Furthermore, we show that the problem of determining an optimal partition for
the rows and columns simultaneously is NP-Hard under a simple linear cost
model.
To evaluate our algorithm empirically against existing heuristics, we
introduce the 1D-VBR format, a specialization of VBR format where columns are
left ungrouped. We evaluate our algorithms on all 1626 real-valued matrices in
the SuiteSparse Matrix Collection. When asked to minimize an empirically
derived cost model for a sparse matrix-vector multiplication kernel, our
algorithm produced partitions whose 1D-VBR realizations achieve a speedup of at
least 1.18 over an unblocked kernel on 25% of the matrices, and a speedup of at
least 1.59 on 12.5% of the matrices. The 1D-VBR representation produced by our
algorithm had faster SpMVs than the 1D-VBR representations produced by any
existing heuristics on 87.8% of the test matrices.
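The row-partitioning idea can be sketched as a dynamic program over contiguous row groups. This is a simplified O(n^2) illustration under a hypothetical cost model (a per-group overhead `alpha` plus the stored entries, which in a 1D-VBR-style layout is the group height times the number of distinct nonzero columns in the group); the paper's algorithm achieves linear time under a more expressive cost model:

```python
def optimal_row_partition(rows, alpha=1.0):
    """Partition rows [0, n) into contiguous groups minimizing total cost.

    rows: list of sets; rows[i] holds the column indices of nonzeros in
    row i. Cost of one group covering rows i..j-1 (a simplified stand-in
    for the paper's cost model): alpha + (j - i) * |union of column sets|.
    Returns (minimum cost, list of (start, end) groups).
    """
    n = len(rows)
    INF = float("inf")
    best = [0.0] + [INF] * n       # best[j]: min cost of partitioning [0, j)
    choice = [0] * (n + 1)         # choice[j]: start of the last group
    for j in range(1, n + 1):
        cols = set()
        for i in range(j - 1, -1, -1):     # grow the last group downward
            cols |= rows[i]
            cost = best[i] + alpha + (j - i) * len(cols)
            if cost < best[j]:
                best[j], choice[j] = cost, i
    # Recover the split points from the choice array.
    groups, j = [], n
    while j > 0:
        groups.append((choice[j], j))
        j = choice[j]
    return best[n], groups[::-1]
```

For example, two rows sharing a sparsity pattern are merged into one group while a row with disjoint columns is kept separate, since merging it would pad every stored block with zeros.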
An Efficient Fill Estimation Algorithm for Sparse Matrices and Tensors in Blocked Formats
Tensors, linear-algebraic extensions of matrices in arbitrary dimensions, have numerous applications in computer science and computational science. Many tensors are sparse, containing more than 90% zero entries. Efficient algorithms can leverage sparsity to do less work, but the irregular locations of the nonzero entries pose challenges to performance engineers. Many tensor operations such as tensor-vector multiplications can be sped up substantially by breaking the tensor into equally sized blocks (only storing blocks which contain nonzeros) and performing operations in each block using carefully tuned code. However, selecting the best block size is computationally challenging. Previously, Vuduc et al. defined the fill of a sparse tensor to be the number of stored entries in the blocked format divided by the number of nonzero entries, and showed that the fill can be used as an effective heuristic to choose a good block size. However, they gave no accuracy bounds for their method for estimating the fill, and it is vulnerable to adversarial examples. In this paper, we present a sampling-based method for finding a (1 + epsilon)-approximation to the fill of an order N tensor for all block sizes less than B, with probability at least 1 - delta, using O(B^(2N) log(B^N / delta) / epsilon^2) samples for each block size. We introduce an efficient routine to sample for all B^N block sizes at once in O(N B^N) time. We extend our concentration bounds to a more efficient bound based on sampling without replacement, using the recent Hoeffding-Serfling inequality. We then implement our algorithm and compare our scheme to that of Vuduc, as implemented in the Optimized Sparse Kernel Interface (OSKI) library. We find that our algorithm provides faster estimates of the fill at all accuracy levels, providing evidence that this is both a theoretical and practical improvement. Our code is available under the BSD 3-clause license at https://github.com/peterahrens/FillEstimation
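The fill, and a sampling estimator for it, can be sketched for the matrix case (N = 2). This is a simplified unbiased estimator for illustration only, not the paper's scheme or its (1 + epsilon) concentration bounds; for clarity it precomputes all per-block counts, whereas a real estimator would count nonzeros only in the sampled blocks:

```python
import random
from collections import Counter

def exact_fill(coords, r, c):
    """Exact fill for block size (r, c): stored entries in the blocked
    format (nonzero blocks times block area) divided by nnz."""
    blocks = {(i // r, j // c) for i, j in coords}
    return len(blocks) * r * c / len(coords)

def estimate_fill(coords, r, c, samples=1000, seed=0):
    """Sampling estimate of the fill.

    Sample nonzeros uniformly; a nonzero lying in a block that holds k
    nonzeros contributes r*c / k. The expectation is
    sum over blocks of (k/nnz) * (r*c/k) = (#blocks * r * c) / nnz,
    i.e. exactly the fill, so the estimator is unbiased.
    """
    per_block = Counter((i // r, j // c) for i, j in coords)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        i, j = rng.choice(coords)
        total += r * c / per_block[(i // r, j // c)]
    return total / samples
```

The same construction extends to order-N tensors by blocking each mode, which is where the B^N block-size candidates in the bounds above come from.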
Shock compression of single-crystal forsterite
Dynamic compression results are reported for single-crystal forsterite loaded along the orthorhombic a and c axes to pressures from 130 to 165 GPa. Hugoniot states for the two axes are well described by a single curve offset to densities 0.15–0.20 g/cm^3 lower than earlier data for single-crystal forsterite shocked along the b axis above 100 GPa. Earlier data of Syono et al. [1981a] show marginal support for similar b-axis behavior in the mixed-phase region from 50 to 92 GPa. Thus shocked forsterite is most compressible in the b direction for the mixed-phase and high-pressure regimes (P > 50 GPa). These data represent the highest pressures for which shock properties have been observed to depend on crystal orientation. Theoretical Hugoniots for mixed-oxide and perovskite-structure high-pressure assemblages of forsterite calculated from recent experimental data are virtually identical and agree with the b-axis data. The a- and c-axis data are also consistent with both high-pressure assemblages because uncertainties in equation of state parameters produce a broad range of computed Hugoniots. Our calculated “average” Hugoniot is up to 0.13 g/cm^3 less dense than the preferred theoretical Hugoniots, in agreement with earlier measurements on dense polycrystalline forsterite. Interpolation between the single-crystal forsterite Hugoniots and Hugoniots for fayalite and Fo_(45) gives Fo_(88) Hugoniots bracketing Twin Sisters dunite data not previously well fit by systematics. Release paths are steep for the a and b axes but c-axis release paths are much shallower. Hugoniot elastic limits measured for the a and b axes are in good agreement with previous data of Syono et al.; however, the present data for the a axis reveal a triple wave structure: two deformational shock waves as well as the elastic shock, a feature not previously found. 
The second shock, with an amplitude of about 9 GPa and a shock temperature of about 350 K, could perhaps be explained by the forsterite α→β or γ phase transformation.
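The Hugoniot states referred to above follow from the standard Rankine–Hugoniot jump conditions (stated here for reference; they are not given in the abstract):

```latex
\rho_0 U_s = \rho\,(U_s - u_p), \qquad
P - P_0 = \rho_0\, U_s\, u_p, \qquad
E - E_0 = \tfrac{1}{2}\,(P + P_0)\,(V_0 - V),
```

where $\rho_0$, $P_0$, $E_0$, $V_0$ describe the initial state, $U_s$ is the shock velocity, $u_p$ the particle velocity, and $V = 1/\rho$ the specific volume.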
Shock wave equations of state using mixed-phase regime data
A method is given that uses Hugoniot data in the mixed-phase regime to further constrain the equation of state (EOS) parameters of low- and high-pressure phases of materials undergoing phase transformations on shock loading. We compute the relative proportion of low- and high-pressure phases present in the mixed-phase region and apply additional tests to the EOS parameters of the separate low- and high-pressure phases by invoking two simple requirements: the fraction of high-pressure phase (1) must increase with increasing shock pressure, and (2) must approach one at the high-pressure end of the mixed-phase regime. We apply our analysis to previously published data for potassium thioferrite, KFeS_2, and pyrrhotite, Fe_(0.9)S. We find that including the mixed-phase regime data in the KFeS_2 analysis requires no change in the published high-pressure EOS parameters. For Fe_(0.9)S we must modify the high-pressure phase EOS parameters to account for both the mixed-phase and high-pressure phase Hugoniot data. Our values of the zero-pressure density, bulk modulus, and first pressure derivative of the bulk modulus of the high-pressure phase of Fe_(0.9)S are 5.3 Mg/m^3, 106 GPa, and 4.9, respectively.
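Assuming volume additivity of the two phases at a given shock pressure $P$ (a standard assumption; the abstract does not spell out the computation), the mass fraction $\lambda$ of high-pressure phase follows by the lever rule:

```latex
V(P) = (1 - \lambda)\, V_{\mathrm{lp}}(P) + \lambda\, V_{\mathrm{hp}}(P)
\quad\Longrightarrow\quad
\lambda = \frac{V_{\mathrm{lp}}(P) - V(P)}{V_{\mathrm{lp}}(P) - V_{\mathrm{hp}}(P)},
```

where $V$ is the measured Hugoniot specific volume and $V_{\mathrm{lp}}$, $V_{\mathrm{hp}}$ are the low- and high-pressure phase volumes from their candidate EOS parameters. The two tests above then amount to requiring $\lambda$ to increase monotonically with $P$ and to approach 1 at the top of the mixed-phase regime.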
LATE Ain'T Earley: A Faster Parallel Earley Parser
We present the LATE algorithm, an asynchronous variant of the Earley
algorithm for parsing context-free grammars. The Earley algorithm is naturally
task-based, but is difficult to parallelize because of dependencies between the
tasks. The LATE algorithm instead uses additional data structures to
maintain information about the state of the parse so that work items may be
processed in any order. This property allows the LATE algorithm to be sped up
using task parallelism. We show that the LATE algorithm can achieve a 120x
speedup over the Earley algorithm on a natural language task.
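The order-independence idea can be sketched as follows: alongside the chart, the recognizer keeps maps of waiting items (those whose next symbol is a nonterminal) and completed spans, so that whichever member of a predictor/completer pair is processed second still finds the first. This is a minimal sequential sketch of that idea, with names chosen here; it is not the LATE implementation and includes no task parallelism:

```python
from collections import defaultdict

def late_recognize(grammar, start, tokens):
    """Order-insensitive Earley-style recognizer.

    grammar: {nonterminal: [tuple_of_symbols, ...]}; symbols absent from
    `grammar` are terminals. Work items are ((lhs, rhs, dot, origin), k)
    and may be popped from the worklist in any order, because `waiting`
    and `completed` retain enough parse state to pair items later.
    """
    n = len(tokens)
    seen, work = set(), []
    waiting = defaultdict(list)    # (sym, k) -> items at k awaiting sym
    completed = defaultdict(set)   # (sym, k) -> end positions of finished sym

    def add(item, k):
        if (item, k) not in seen:
            seen.add((item, k))
            work.append((item, k))

    for rhs in grammar[start]:
        add((start, tuple(rhs), 0, 0), 0)
    while work:
        item, k = work.pop()       # any order is fine
        lhs, rhs, dot, origin = item
        if dot < len(rhs):
            sym = rhs[dot]
            if sym in grammar:                       # predict + register
                waiting[(sym, k)].append(item)
                for prod in grammar[sym]:
                    add((sym, tuple(prod), 0, k), k)
                for j in list(completed[(sym, k)]):  # pair with prior completions
                    add((lhs, rhs, dot + 1, origin), j)
            elif k < n and tokens[k] == sym:         # scan
                add((lhs, rhs, dot + 1, origin), k + 1)
        else:                                        # complete + register
            completed[(lhs, origin)].add(k)
            for (l2, r2, d2, o2) in list(waiting[(lhs, origin)]):
                add((l2, r2, d2 + 1, o2), k)
    return n in completed[(start, 0)]
```

Because registration in `waiting`/`completed` happens before pairing, every matching pair is combined exactly once regardless of pop order, which is what makes a parallel worklist feasible.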